Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Large multi-modal models inevitably decay over time as facts update and previously learned information becomes outdated. Traditional approaches such as fine-tuning are often impractical for updating these models due to their size and complexity. Instead, direct knowledge editing within the models presents a more viable solution. Current model editing techniques, however, typically overlook the unique influence ranges of different facts, leading to compromised model performance in terms of both generality and locality. To address this issue, we introduce the concept of the generality-locality trade-off in multi-modal model editing. We develop a new model editing dataset named OKEDIT, specifically designed to effectively evaluate this trade-off. Building on this foundation, we propose BalancEdit, a novel method for balanced model editing that dynamically achieves an optimal balance between generality and locality. BalancEdit utilizes a unique mechanism that generates both positive and negative samples for each fact to accurately determine its influence scope and incorporates these insights into the model's latent space using a discrete, localized codebook of edits, without modifying the underlying model weights. To our knowledge, this is the first approach explicitly addressing the generality-locality trade-off in multi-modal model editing. Our comprehensive results confirm the effectiveness of BalancEdit, demonstrating minimal trade-offs while maintaining robust editing capabilities.more » « lessFree, publicly-accessible full text available July 14, 2026
-
Recent studies have uncovered a troubling vulnerability in the fine-tuning stage of large language models (LLMs): even fine-tuning on entirely benign datasets can lead to a significant increase in the harmfulness of LLM outputs. Building on this finding, our red teaming study takes this threat one step further by developing a more effective attack. Specifically, we analyze and identify samples within benign datasets that contribute most to safety degradation, then fine-tune LLMs exclusively on these samples. We approach this problem from an outlier detection perspective and propose Self-Inf-N, to detect and extract outliers for fine-tuning. Our findings reveal that fine-tuning LLMs on 100 outlier samples selected by Self-Inf-N in the benign datasets severely compromises LLM safety alignment. Extensive experiments across seven mainstream LLMs demonstrate that our attack exhibits high transferability across different architectures and remains effective in practical scenarios. Alarmingly, our results indicate that most existing mitigation strategies fail to defend against this attack, underscoring the urgent need for more robust alignment safeguards.more » « lessFree, publicly-accessible full text available July 14, 2026
-
Domain adaptation addresses the challenge where the distribution of target inference data differs from that of the source training data. Recently, data privacy has become a significant constraint, limiting access to the source domain. To mitigate this issue, Source-Free Domain Adaptation (SFDA) methods bypass source domain data by generating source-like data or pseudo-labeling the unlabeled target domain. However, these approaches often lack theoretical grounding. In this work, we provide a theoretical analysis of the SFDA problem, focusing on the general empirical risk of the unlabeled target domain. Our analysis offers a comprehensive understanding of how representativeness, generalization, and variety contribute to controlling the upper bound of target domain empirical risk in SFDA settings. We further explore how to balance this trade-off from three perspectives: sample selection, semantic domain alignment, and a progressive learning framework. These insights inform the design of novel algorithms. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on three benchmark datasets--Office-Home, DomainNet, and VisDA-C--yielding relative improvements of 3.2%, 9.1%, and 7.5%, respectively, over the representative SFDA method, SHOT.more » « lessFree, publicly-accessible full text available June 11, 2026
-
Diffusion models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning certain training samples during the training stage. This poses a significant threat to real-world applications in the Model-as-a-Service (MaaS) scenario, where users query diffusion models through APIs or directly download them from the internet. To mitigate the threat of backdoor attacks under MaaS, black-box input-level backdoor detection has drawn recent interest, where defenders aim to build a firewall that filters out backdoor samples in the inference stage, with access only to input queries and the generated results from diffusion models. Despite some preliminary explorations on the traditional classification tasks, these methods cannot be directly applied to the generative tasks due to two major challenges: (1) more diverse failures and (2) a multi-modality attack surface. In this paper, we propose a black-box input-level backdoor detection framework on diffusion models, called UFID. Our defense is motivated by an insightful causal analysis: Backdoor attacks serve as the confounder, introducing a spurious path from input to target images, which remains consistent even when we perturb the input samples with Gaussian noise. We further validate the intuition with theoretical analysis. Extensive experiments across different datasets on both conditional and unconditional diffusion models show that our method achieves superb performance on detection effectiveness and run-time efficiency.more » « lessFree, publicly-accessible full text available April 11, 2026
-
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, where attackers can inject hidden backdoors during the training stage. This poses a serious threat to the Model-as-a-Service setting, where downstream users directly utilize third-party models (e.g., HuggingFace Hub, ChatGPT). To this end, we study the inference-stage black-box backdoor detection problem in the paper, where defenders aim to build a firewall to filter out the backdoor inputs in the inference stage, with only input samples and prediction labels available. Existing investigations on this problem either rely on strong assumptions on types of triggers and attacks or suffer from poor efficiency. To build a more generalized and efficient method, we first provide a novel causality-based lens to analyze heterogeneous prediction behaviors for clean and backdoored samples in the inference stage, considering both sample-specific and sample-agnostic backdoor attacks. Motivated by the causal analysis and do-calculus in causal inference, we introduce Black-box Backdoor detection under the Causality Lens (BBCaL) which distinguishes backdoor and clean samples by analyzing prediction consistency after progressively constructing counterfactual samples. Theoretical analysis also sheds light on the effectiveness of the BBCaL. Extensive experiments on three benchmark datasets validate the effectiveness and efficiency of our method.more » « lessFree, publicly-accessible full text available December 24, 2025
-
Free, publicly-accessible full text available January 22, 2026
-
Anti-backdoor learning, aiming to train clean models directly from poisoned datasets, serves as an important defense method for backdoor attack. However, existing methods usually fail to recover backdoored samples to their original, correct labels and suffer from poor generalization to large pre-trained models due to its non end-to end training, making them unsuitable for protecting the increasingly prevalent large pre-trained models. To bridge the gap, we first revisit the anti-backdoor learning problem from a causal perspective. Our theoretical causal analysis reveals that incorporating both images and the associated attack indicators preserves the model's integrity. Building on the theoretical analysis, we introduce an end-to-end method, Mind Control through Causal Inference (MCCI), to train clean models directly from poisoned datasets. This approach leverages both the image and the attack indicator to train the model. Based on this training paradigm, the model’s perception of whether an input is clean or backdoored can be controlled. Typically, by introducing fake non-attack indicators, the model perceives all inputs as clean and makes correct predictions, even for poisoned samples. Extensive experiments demonstrate that our method achieves state-of-the-art performance, efficiently recovering the original correct predictions for poisoned samples and enhancing accuracy on clean samples.more » « lessFree, publicly-accessible full text available January 22, 2026
-
Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables, etc.) severely compromises the reliability of CI methods. The issue may arise from the inherent difficulty in measuring the variables. Additionally, in observational studies where variables are passively recorded, certain covariates might be inadvertently omitted by the experimenter. Depending on the type of unobserved variables and the specific CI task, various consequences can be incurred if these latent variables are carelessly handled, such as biased estimation of causal effects, incomplete understanding of causal mechanisms, lack of individual-level causal consideration, etc. In this survey, we provide a comprehensive review of recent developments in CI with latent variables. We start by discussing traditional CI techniques when variables of interest are assumed to be fully observed. Afterward, under the taxonomy of circumvention and inference-based methods, we provide an in-depth discussion of various CI strategies to handle latent variables, covering the tasks of causal effect estimation, mediation analysis, counterfactual reasoning, and causal discovery. Furthermore, we generalize the discussion to graph data where interference among units may exist. Finally, we offer fresh aspects for further advancement of CI with latent variables, especially new opportunities in the era of large language models (LLMs).more » « less
An official website of the United States government
